Published online 13 October 2011 



Nucleic Acids Research, 2012, Vol. 40, No. 3 1131-1147 

doi:10.1093/nar/gkr834 



Genome sequencing of the lizard parasite 
Leishmania tarentolae reveals loss of genes 
associated to the intracellular stage of 
human pathogenic species 

Frederic Raymond 1 , Sebastien Boisvert 1 , Gaetan Roy 1 , Jean-Francois Ritt 1 , 
Danielle Legare 1 , Amandine Isnard 4 , Mario Stanke 5 , Martin Olivier 4 , 
Michel J. Tremblay 1 ' 3 , Barbara Papadopoulou 1,3 *, Marc Ouellette 1 ' 3 '* and 
Jacques Corbeil 1 ' 2 '* 

infectious Disease Research Centre, CHUL Research Centre (CHUQ), department of Molecular Medicine, 
3 Department of Microbiology and Immunology, School of Medicine, Universite Laval, Quebec City, 
department of Microbiology and Immunology, McGill University, Montreal, Quebec, Canada and 
5 lnstitut fur Mathematik und Informatik, University of Greifswald, Greifswald, Germany 



Received July 7, 2011; Revised and Accepted September 20, 2011 



ABSTRACT 

The Leishmania tarentolae Parrot-Tarll strain gen- 
ome sequence was resolved to an average 16-fold 
mean coverage by next-generation DNA sequencing 
technologies. This is the first non-pathogenic to 
humans kinetoplastid protozoan genome to be 
described thus providing an opportunity for com- 
parison with the completed genomes of pathogenic 
Leishmania species. A high synteny was observed 
between all sequenced Leishmania species. A 
limited number of chromosomal regions diverged 
between L. tarentolae and L. infantum, while re- 
maining syntenic to L. major. Globally, >90% of the 
L. tarentolae gene content was shared with the 
other Leishmania species. We identified 95 pre- 
dicted coding sequences unique to L. tarentolae 
and 250 genes that were absent from L. tarentolae. 
Interestingly, many of the latter genes were ex- 
pressed in the intracellular amastigote stage of 
pathogenic species. In addition, genes coding for 
products involved in antioxidant defence or part- 
icipating in vesicular-mediated protein transport 
were underrepresented in L. tarentolae. In contrast 
to other Leishmania genomes, two gene families 
were expanded in L. tarentolae, namely the zinc 
metallo-peptidase surface glycoprotein GP63 and 



the promastigote surface antigen PSA31C. Overall, 
L. tarentolae's gene content appears better adapted 
to the promastigote insect stage rather than the 
amastigote mammalian stage. 

INTRODUCTION 

Leishmania is an early-branching unicellular eukaryote 
that belongs to the Kinetoplastida order and the family 
Trypanosomatidae. Leishmania species are transmitted by 
the bite of female phlebotomine sand flies as extracellular 
flagellated metacyclic promastigotes and replicate in mam- 
malian macrophages as intracellular aflagellated amasti- 
gotes. Leishmania infections represent a global health 
problem with 350 million people at risk, an annual inci- 
dence of 2 million and an overall prevalence estimated at 
12 million people worldwide (1). At least 20 Leishmania 
species cause a large spectrum of clinical manifestations 
ranging from self-resolving skin lesions (L. major) to 
mucocutaneous manifestations (L. braziliensis) reaching 
to life-threatening visceral diseases (L. donovanij 
L. infantum). 

Through phylogenetic analyses, Leishmania was divided 
in three distinct subgenera: the Leishmania, the Viannia 
and the Sauroleishmania (2). The classification of lizard 
Leishmania as a distinct genus was once debated (3,4) 
but the molecular evidence did not support this assump- 
tion. Leishmania (Sauroleishmania) tarentolae was first 
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isolated from the lizard Tarentola mauritanica in 1921 (5) 
and is probably the most widely studied Leishmania 
(S aur o le is h mania) species. In lizards, the parasites live 
predominantly as promastigotes in the lumen of the 
cloacae and intestine or in the bloodstream (6). 
Amastigotes, either free or inside monocytes, are rarely 
observed in lizards (4,6), although both free promastigotes 
and amastigotes from the blood were reported (7). The 
ability of L. tarentolae to develop into amastigote forms 
in the lizard is still debated but, as for several Leishmania 
species infecting lizards (6), L. tarentolae is able to enter 
human phagocytic cells and differentiate into amastigote- 
like forms. However, there is no clear evidence for their 
efficient replication within macrophages (8-10). Because 
of its rapid growth in defined media and lack of patho- 
genicity to humans, L. tarentolae has been used widely as a 
model organism for studies on gene amplification (11-13) 
or RNA editing (14). Furthermore, L. tarentolae has been 
used as a platform for the production of recombinant 
proteins (15) and as a potential vaccine candidate (9,10). 

The genomes of two Leishmania (Leishmania) species, 
L. major and L. infantum, and one Leishmania (Viannia) 
species, L. braziliensis, have already been completed and 
annotated (16,17), and more are being sequenced (www. 
tritrypdb.org). The 32.8 Mb genome of L. major clone 
Friedlin spreads over 36 chromosomes and is presently 
the best annotated Leishmania genome (16). A 5-fold shot- 
gun sequencing was provided for L. braziliensis clone 
M2904 and L. infantum clone JPCM5. The genomes of 
the various Leishmania species contain a similar number 
of genes, estimated at 8200 (Table 1). Despite the 20-100 
million years of divergence within the Leishmania genus, a 
recent sequence comparison of the genomes of L. major, 
L. infantum and L. braziliensis has revealed a strong con- 
servation of gene content and synteny across the genus 
(17). Comparative genomics of L. major, L. infantum 
and L. braziliensis have shown that approximately 200 
genes were differentially distributed between the species 
and that only 78 genes were unique to one species (17). 
Notable differences were observed in L. braziliensis, which 
contains a putative RNA interference pathway and two 
types of transposons (TATES) and retroposons (SLACS) 
absent in the two other species. 

Here, we sequenced and assembled de novo the 
L. tarentolae strain Parrot-Tarll using next-generation 



high-throughput DNA sequencing technologies. The com- 
parison of the L. tarentolae genome with the published 
genomes of pathogenic Leishmania species revealed a 
high degree of synteny. We identified 95 predicted coding 
sequences unique to L. tarentolae and 250 genes present 
in the pathogenic species but absent in L. tarentolae. 
Interestingly, several of the genes lacking from 
L. tarentolae are expressed preferentially in the intracellu- 
lar amastigote stage of pathogenic species. This may ex- 
plain in part why L. tarentolae is less well adapted to infect 
human macrophages and why these parasites are mostly 
reported as free organisms in the lizards. 

MATERIALS AND METHODS 

Sample preparation and sequencing 

Promastigotes of L. tarentolae (strain Parrot-Tarll) were 
grown up to the late log phase in SDM-79 (Schneider's 
Drosophila medium) supplemented with 5 ug/ml hemin 
and 5% heat-inactivated fetal calf serum (FCS) 
(Multicell, Wisent Inc.). High molecular weight genomic 
DNA was extracted from parasites after chemical lysis 
with SDS (1%) followed by two sequential phenol extrac- 
tions, proteinase K and RNase A treatments. 
Chromosomal DNA was further purified on cesium chlor- 
ide gradients to limit the extent of kinetoplastid DNA in- 
clusion (18). Purified L. tarentolae genomic DNA was 
sequenced with next-generation sequencing technologies 
using paired (2.5 kb inserts) and unpaired GS-FLX or 
Titanium sequencing procedures (Roche 454). GS-FLX 
and Titanium sequencing were performed at the McGill 
University and Genome Quebec Innovation Centre, 
Montreal, Canada. Genomic sequences using unpaired 
Solexa/Illumina sequencing were performed at the 
Netherlands Cancer Institute. Sequencing runs are 
described in Supplementary Table SI. 

Assembly 

Sequences produced on the GS-FLX and Titanium se- 
quencers were assembled using Newbler 2.0 (Roche). 
The N 50 scaffold length of the L. tarentolae assembly 
was 61 619 bp. Sequences produced by Solexa/Illumina 
were assembled with the Velvet version 0.7.55 software 
(19). Additional validation of the assembly was performed 



Table 1. Summary of sequenced Leishmania spp. genomes 





Leishmania 


Leishmania 


Leishmania 


Leishmania 




tarentolae 


major (V5.2) 


infantum (V3a) 


braziliensis (V2) 


Strain 


Parrott-Tarll 


Friedlin 


JPCM5 


WHOM/BR/75/M2904 


Number of chromosomes 


36 


36 


36 


35 


Genome size (bp) 


30 440719 


32816678 


32134935 


32 005 207 


Overall G+C content (%) 


57.2 


59.7 


59.3 


57.8 


Coding G+C content (%) 


58.4 


62.5 


62.5 


60.4 


Number of coding genes in database (time of study) 


8201 


8304 


8216 


8133 


Number of OGs 


7449 


7530 


7506 


7353 


Mean nucleotide identity with L. tarentolae (%) 




84.9 


85.0 


79.2 


Mean amino acid identity with L. tarentolae (%) 




81.9 


82.4 


74.8 


Reference 


Current study 


(16) 


(17) 


(17) 
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using our own de novo assembler Ray (20). The assemblies 
were merged using the minimus2 scaffolder found in the 
AMOS package version 2.0.8 (21). Chromosome-like scaf- 
folds were created by comparing gene order in the de novo 
assembled scaffolds and contigs to the gene order in the 
version 5.2 of the reference L. major genome using BLAST 
and custom python scripts. 

Annotation 

Gene identification was performed by comparing the 
assembled L. tarentolae genome sequence to the putative 
genes of L. infantum, L. major and L. braziliensis using 
BLAST 2.2.21 (22). Sequences from the assembly were 
compared to the coding sequences (CDS) of the three 
other species and all L. tarentolae open reading frames 
(ORFs) larger than 100 nucleotides were translated and 
compared to the gene sequences of the three other species. 
Similar analysis was also performed on T. cruzi and 
T. brucei. Another set of genes was predicted with 
Augustus using evidence from protein homology to 
Trypanosomatidae and an ab initio model trained for 
L. tarentolae (23-25). A consensus set of genes was com- 
piled by pooling the two annotation sets, which were 
filtered in order to identify the most probable genes in 
the L. tarentolae sequence. All putative genes were 
mined for domain structures and motifs using HMMER 
2.0 and the pfam database (26,27). Further gene ontology 
(GO) analyses were performed using Blast2GO (28). 
Additional phylogenetic analyses were conducted in 
MEGA4 (29). Synteny maps were drawn using the R 
software (http://www.r-project.org/) based on chromo- 
some comparisons with BLAST (blastn). The genome se- 
quence and annotation of L. tarentolae Parrot-Tarll is 
available on TriTrypDB (www.tritrypdb.org). 

Orthologous group identification 

Putative L. tarentolae genes were compared to genes from 
other sequenced Leishmania species using the OrthoMCL 
2.0 web tool (30,31). Genes from L. major, L. infantum 
and L. braziliensis have already been clustered in groups of 
orthologs, which are publicly available in the OrthoMCL 
database (30). Each group of orthologous genes con- 
tains orthologs (related genes found in different species) 
and paralogs (related genes found within a species). 
Orthologous groups (OGs) may include whole gene fami- 
lies, subfamilies or single genes, depending on the extent 
and variability of the gene family. Genes qualified as 
absent from L. tarentolae were validated manually by 
searching the sequence of this gene in the original reads 
and by assembling the matching reads using CAP3 (32). 

Validation of copy number variation 

Reads were mapped onto the L. tarentolae scaffolds using 
the BWA software (33). The number of reads corres- 
ponding to each nucleotide position was calculated and 
the mean read coverage for each gene was estimated. 
For each OG, the read coverages of all genes were 
summed to estimate the total coverage for each OG. 
The total read coverage of OG were compared to the 
number of genes in the OG. Orthologous groups for 



which copy number variation were observed between 
L. tarentolae and the other species were manually in- 
spected to validate that total coverage of OG confirmed 
the number of genes. 

Southern blot hybridizations 

Promastigotes of L. tarentolae strain Parrot-Tarll, 
L. tarentolae strain S125, L. major strain Friedlin, 
L. infantum strain JPCM5 and L. braziliensis strain 
WHOM/BR/75/M2904 were grown at 25°C in SDM-79 
medium supplemented with 10% heat-inactivated FCS 
and 5 |ig/ml of hemin. Total DNA from each culture was 
prepared with DNAzole (Invitrogen), digested with Xhol 
restriction enzyme (New England Biolabs, Pickering, ON, 
Canada) and run-on standard agarose gels. Southern blot 
analyses with [a- 32 P]-dCTP labelled DNA probes were 
performed according to standard protocols (18). All 
probes were generated by polymerase chain reaction using 
primer sets listed in Supplementary Table S2. For each 
target, we generated a specific probe for each Leishmania 
species, which were co-hybridized on a blot of total 
digested genomic DNAs. Equal amount of DNA was 
layered for each strain and monitored by hybridization 
with the single copy PTR1 gene. 

Comparative genomic hybridization 

Leishmania whole-genome DNA microarrays used in 
CGH experiments, which included 8100 60-mer probes 
that were designed to hybridize all genes of L. major 5.2 
and L. infantum 3.0a, were obtained from Agilent 
Technologies (Mississauga, ON, Canada). Microarray 
platform details and probe sequences were deposited in 
the GEO database under the accession GPL 11330. 
Sample preparation, pre-hybridization and hybridization 
steps were performed as previously described (34). 
Normalization and data analysis were done in R with 
L1MMA 2.7.3. (35). Multiple testing correction was 
done using the false discovery rate (FDR) method and 
probes were considered significant when P < 0.05 and 
log-2 ratio > 2. The entire data set was deposited in 
GEO under the reference number series GSE27184. 



H 2 0 2 IC50 assay 

Promastigotes of L. major LV39, L. infantum 
MHOMMA#67#ITMAP-263 and L. tarentolae SI 25 
were grown at pH 7.0 and 25°C in SDM-79 medium sup- 
plemented with 10% fetal bovine serum, 5 (ig/ml hemin 
and 5uM biopterin (Sigma-Aldrich, St-Louis, MO). 
Leishmania promastigotes (5 x 10 6 ) were inoculated in 
5 ml medium and H 2 0 2 (Rougier Pharma, Mirabel, QC, 
Canada) was added at various concentration (100 uM, 
250 uM, 500 uM, 1000 uM, 2500 uM and 5000 uM). 
OD 600nm values were taken after 72 h. Each curve was per- 
formed in triplicate. 1C50 were calculated for each curve 
and the mean and 95% confidence intervals were 
calculated. 
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RESULTS 

Leishmania tarentolae genome sequencing 

The genome of L. tarentolae strain Parrot-Tarll was 
resolved using high-throughput sequencing technologies 
(Supplementary Table SI) to a 16-fold mean coverage 
and 23-fold peak coverage. The assembled L. tarentolae 
genome contains a total of 30 440 719 bases with 95.1 % of 
the GS-FLX and Titanium reads found in 773 scaffolds 
and the remaining 4.9% distributed in 2499 contigs. Reads 
obtained by Illumina sequencing were also incorporated in 
the final sequence to assist in the assembly. After de novo 
assembly, sequences of specific chromosomes were built 
using contigs and scaffolds based on their homology to 
L. major. Directed assembly on L. major allowed the 
mapping of 29862062 bases (98.1%) leaving 578657 
bases (1.9%) in 1315 small contigs. A summary of the 
sequencing statistics of the L. tarentolae genome and the 
other published Leishmania spp. genomes are presented in 
Table 1. CHEF analysis of the L. tarentolae genome 
showed 24-28 chromosomal bands (36). Similar analysis 
of the L. infantum genome showed a comparable number 
of band, which were further resolved in 36 chromosomes 
using hybridization of genome fragments (37). 

Leishmania tarentolae is syntenic with other 
Leishmania species 

De novo assembled contigs and scaffolds were generally 
syntenic with both L. major and L. infantum. Most differ- 
ences between L. tarentolae and the other species consisted 
in gene insertions or deletions distributed randomly or in 
tandem arrays throughout the genome. Full synteny maps 
comparing L. tarentolae to L. infantum and L. major are 
provided in Supplementary Figures S1-S36. Although 
L. tarentolae possesses similar percent identity with 
L. infantum and L. major (Table 1), its synteny is closer 
to the latter. For example on chromosome 28, a stretch of 
90 kb is syntenic between L. tarentolae and L. major but 
the gene ordering is different in L. infantum (Figure 1A). 
De novo assembled scaffolds are, in this case, long enough 
to confirm that L. tarentolae is more syntenic to L. major. 
Similar differences were found at the proximal region 
of chromosome 7 (Figure IB) and at the distal end 
of chromosome 35 (Figure 1C), both suggesting greater 
synteny to L. major than to L. infantum. Other 
loci where L. tarentolae de novo assembled scaffolds 
were syntenic to L. major but not to L. infantum can 
be found in Supplementary Figures S7, S9, SI 1-13, S32 
and S35. Overall, the synteny of L. tarentolae is closer 
to the L. (Leishmania) species than to the L. (Viannia) 
subgenus. Indeed, several loci are in common 
between L. tarentolae and L. major, although they are 
absent from L. braziliensis. Breaks in synteny are also 
more frequent with L. braziliensis than L. major (data 
not shown). 

Leishmania tarentolae gene content 

Genome annotation of L. tarentolae indicated a total of 
8201 putative protein-coding genes, a number similar to 
the other sequenced Leishmania species (Table 1). 



Annotation was performed by comparing the L. tarentolae 
genome to other Leishmania and Trypanosoma species 
along with ab initio annotation using the Augustus 
software trained for Leishmania gene detection (23-25). 
Given that assembly of repeated gene clusters is more dif- 
ficult, the count of genes found in L. tarentolae may be 
biased, especially for genes present in high copy number. 

Using the OrthoMCL web tool, the set of L. tarentolae 
putative genes was compared to the OrthoMCL database 
(www.orthomcl.org) in order to assign each gene to a 
group of orthologs (30). This allowed us to readily com- 
pare the gene content of the four sequenced Leishmania 
species, determine which genes are unique to a given 
species and calculate which OG of genes vary in copy 
number between the different species. These results were 
further confirmed by interspecies comparative genomics 
hybridization (CGH) microarrays, read depth analysis 
and, in selected cases, by Southern blots analyses. The 
gene content of L. tarentolae is highly similar to the 
three pathogenic Leishmania species sequenced, which 
contain a similar number of OG (Table 1). 

Figure 2 compares the gene content of L. tarentolae to 
other Leishmania species with emphasis on L. major. 
Overall, 7331 OG are shared by L. tarentolae and 
L. major. Of these, 7225 OG (7662 genes in L. tarentolae 
and 7845 genes in L. major) have a similar copy number in 
L. tarentolae and at least another Leishmania species, 
20 OG (32 genes in L. tarentolae and 131 genes in 
L. major) have a lower copy number in L. tarentolae 
than the three other species (Supplementary Table S3) 
and 86 OG (363 genes in L. tarentolae and 133 in 
L. major) have a higher copy number in L. tarentolae 
than the three other species (Supplementary Table S4). 
More than a third of the OGs with varying copy 
numbers have a putative function (see Figure 2 and 
Supplementary Tables S3 and S4). 

A total of 250 L. major genes distributed between 188 
OGs were found to be absent from L. tarentolae (Figure 2 
and Supplementary Table S5). Of these, 83 OG were 
shared by the three pathogenic species, 74 OG by 
L. major and L. infantum, 5 OG by L. major and 
L. braziliensis and 26 OG were unique to L. major 
(Figure 2). 

A total of 73 OG (95 genes) were unique to L. tarentolae 
(Supplementary Table S6). From these, 31 OG had 
orthologs in other non- Leishmania species, including 29 
in Trypanosoma spp. We also found 42 OG (65 genes) 
that were sequence orphans with no similarity to se- 
quences found in databases. 

Genes absent from L. tarentolae or present in lower copy 
number compared to the pathogenic Leishmania spp 

Leishmania tarentolae lacks several genes coding for pro- 
teins implicated in trafficking. Indeed, the pi/p2-adaptins 
(LmjFl 1.0990; LmjF36.5595), u-adaptin (LmjF31.3035) 
and the epsilon-adaptin (LmjF30.1545) (Figure 2 and 
Supplementary Table S5) were absent from L. tarentolae. 
Adaptins are involved in the formation of clathrin- 
associated adaptor protein (AP) complexes, which play a 
key role in the transport of proteins by regulating the 
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Figure 1. Synteny map of L. tarentolae (middle) compared to L. major (top) and L. infantum (bottom). Genes on chromosome tracks are grey. 
Leishmania tarentolae contig delimitation is indicated in black in the middle lane. Shade of synteny blocks is proportional to sequence identity, the 
darker the more similar are the sequences. The scale represents nucleotide position on the chromosome. (A) 5' region of chromosome 28. (B) 5' 
region of chromosome 7. (C) 3'-end of chromosome 35. 



formation of transport vesicles as well as cargo selection 
between the trans-Golgi network, endosomes, lysosomes 
and the plasma membrane (38,39). The calcium-dependent 
membrane binding proteins copines (LmjF28.1190) and 
Ras-like small GTP-binding proteins (LmjF36.1820), 
both involved in cell signalling and/or membrane 
trafficking/transport pathways and exocytosis, were also 
absent from L. tarentolae. The endosomal/lysosomal 
membrane-bound acid phosphatase (LmjF28.2650), po- 
tentially involved in intracellular trafficking (40), is also 



missing from L. tarentolae. Leishmania tarentolae also 
lacks the phosphatidylinositol 3-kinase 2 gene 
(LmjF14.0020) and the phosphatidylinositol-4-phosphate 
5-kinase gene (LmjF26.2495) (Figure 2 and 
Supplementary Table S5) whose activities were linked to 
a diverse set of key cellular functions, including intracel- 
lular trafficking (41). The Tubby protein 1, that has been 
reported to bind phosphatidylinositol 4,5-bisphosphate on 
the plasma membrane and facilitate macrophage phago- 
cytosis (42), is not present in L. tarentolae. 
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L. major and L. infantum orthologous groups absent from 
L. tarentolae (Supplementary Table S5): 

Delta-Amastin (3 OG) 
Tuzin like protein (2 OG) 

Phosphatidylinositol-4-phosphate 5-kinase, putative 
GTPase activator protein, putative 
Epsilon-adaptin (pseudogene), putative 
Phosphoglycan beta 1,2 arabinosyltransferase, putative 
Serine/threo nine-protein phosphatase PP1, putative 
Tartrate-sensitive acid phosphatase acp-3.2, putative 
ABC transporter-like protein 
Hydrophilic surface protein 2 
Lectin, putative 

Acetylornithine deacetylase-like protein 
Am inoa cyclase, putative 
Cytochrome b5-like protein 
Ef hand-like protein 
Fatty acid elongase, putative 
Glutamamyl carboxypeptidase (pseudogene), putative 
Map kinase (pseudogene) 
Pteridine transporter 
DNA polymerase kappa, putative 
Hypothetical proteins, conserved (8 OG) 
Hypothetical proteins, unknown function {43 OG) 



Orthologous groups with lower copy number in 
L. tarentolae (Supplementary Table S3): 

* Delta-Amastin 
*Tuzin-like protein 

* Histone H3 

* Histone H4 
Histone H2A, putative 

* Phosphoglycan beta 1 ,3 ga la ctosyltransf erase 
Phosphoprotein phosphatase, putative 
SHERP 

Hydrolase, alpha/beta fold family, putative 

Long chain fatty Acyl CoA synthetase, putative 

Sucrose hydrolase-like protein;beta-fructofuranosidase-like 

Uracil phosphoribosyltransferase, putative; uridine kinase-like 

60s ribosomal protein L35, putative 

Ama1 protein, putative 

Aminopeptidase P1, putative 

Hypothetical proteins, conserved (4 OG) 

Hypothetical protein, unknown function 



Orthologous groups unique to L. tarentolae 

(Supplementary Table S6): 

C3HC4 finger protein 

La domain-containing protein 

Malate dehydrogenase 

OmpA family protein 

Phosphoinositide kinase, putative 

Poly(A) polymerase, putative 

Trafficking protein particle complex subunit 2, like 

Zinc finger (Ran-binding) family protein 

Hypothetical proteins with Trypanosoma orthologs (9 OG) 

Hypothetical proteins with no orthologs (42 OG) 




Pathogenic Leishmania orthologous groups absent from 
L. tarentolae (Supplementary Table S5): 
Adaptin-related protein-like protein 
Beta-adaptin protein, putative 
Mu-adaptin 4, putative 
Copine l-like protein 

Phosphatidylinositol 3-kinase 2, putative 
Polynucleotide kinase 3'-phosphatase, putative 
Ras-like small GTPases, putative 
Nucleoside diphosphate kinase b 
Tub family protein-like protein 
Membrane-bound acid phosphatase, putative 
Calreticulin, putative 

ARP2/3 complex subunit, putative {3 OG) 

Microtubule-associated protein 

Kinesin, putative 

Delta-Amastin 

Gamma-Amastin 

Carboxypeptidase, putative 

Fatty acid elongase, puatative 

Aldose 1-epimerase, putative 

Glycosyl transferase, putative 

Subtilisin-like serine peptidase 

Pfpi/DJ-1-like protein, putative 

Tyrosine/dopa decarboxylase, putative 

Oxidoreductase-like protein (pseudogene) 

Hypothetical predicted transmembrane protein 

Hypothetical proteins, conserved (31 OG) 

Hypothetical proteins, unknown function {22 OG) 



Orthologous groups with higher copy number in L. tarentolae (Supplementary Table S4): 

* Major surface protease gp63, putative (leishmanolysin) 

* Hypothetical protein related to surface antigen protein PSA31C 
Inhibitor of cysteine peptidase 

Kinesin, putative (3 OG) 

Multidrug resistance protein, putative; p-glycoprote in-like protein 
Pterin-4-alpha-carbinolamine dehydrogenase 
Pteridine transporter 

Ubiquitin carboxyl-terminal hydrolase, putative 
Surface membrane protein gp46-like protein 
Beta-adaptin, putative 

Vacuolar-type proton translocating pyrophosphatase 1 , putative 

Vacuolar ATP synthase subunit, putative 

P-type ATPase, putative; vacuolaMype Ca2+-ATPase, putative 

Mitotubule-associated protein Gb4, putative 

Serine/threonine protein phosphatase catalytic subunit, putative 

Membrane associated protein-like protein 

NADH:flavin oxido reductase/N AD H oxidase, putative; n-ethylmaleimide reductase-like protein 
Glucokinase 1 -like protein 

Branched-chain amino acid aminotransferase, putative 

Amino acid permease, putative;amino acid transporter aATP1 1 , putative 

Amino acid transporter, putative 

Acetylornithine deacetylase-like protein; glutamamyl carboxypeptidase, putative 

cAMP specific phosphodiesterase, putative 

Acetyl-CoA synthetase, putative 

Citrate synthase, putative 

Endonuclease/Exonuclease/phosphatase, putative 

DEAH-box RNA helicase, putative 

Eukaryotic translation initiation factor 2 subunit, putative 

40S ribosomal protein S6, putative 

Prolyl-tRNA synthetase, putative 

Hypothetical proteins, conserved (42 OG) 

Hypothetical proteins, unknown function (8 OG) 



Figure 2. Differential distribution of genes and OGs of genes between L. tarentolae and L. major. a Gene counts referring to L. major. Gene counts 
referring to L. tarentolae. Lists include the description of OGs of genes that have differential distribution between L. tarentolae and the three 
sequenced Leishmania pathogenic species. Counts of genes within the different OG are in parenthesis. The complete list of genes and OGs for selected 
categories is shown in Supplementary Tables S3-S6. Asterisk indicates genes with the highest copy number variability in L. tarentolae. 
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Leishmania tarentolae lacks three of the seven subunits 
of the Arp2/3 complex, notably the p40/ARPCl 
(LmjF18.0920) p20/ARPC4 (LmjF02.0600) and pl5/ 
ARPC5 (LmjF05.0285) (Figure 2 and Supplementary 
Table S5). Only one Arp2/3 complex subunit coding 
gene was found and annotated in the L. tarentolae 
genome (not shown). The Arp2/3 complex is required 
for actin polymerization and reported to associate with 
several other cytoskeletal components, including microtu- 
bules (43,44). Interestingly, a number of microtubule- 
associated proteins (9 on chromosome 19) and a kinesin 
microtubule-associated protein (LmjF16.1580) are also 
missing from the L. tarentolae genome (Figure 2 and 
Supplementary Table S5). 

Genes reported to play a role in the resistance to oxi- 
dative stress are absent from L. tarentolae. These include a 
subtilisin (LmjF28.2380) belonging to the S8A subfamily 
of proteases, shown recently to process the terminal per- 
oxidases of the trypanothione reductase system in 
Leishmania (45); a Pfpi/DJ-l-like protein (LmjF35.3910) 
that defends cells against reactive oxygen species and 
mitochondrial damage (46); and a tyrosine/dopa decarb- 
oxylase (LmjF30.2500), a precursor of catechol- 
amines, which may act as scavengers of free radicals (47) 
(Figure 2 and Supplementary Table S5). Collectively, 
these data suggest that L. tarentolae may deal less effi- 
ciently with oxidants than L. major. Interestingly, 
L. tarentolae strains were found to be >3. 5-fold more sen- 
sitive to hydrogen peroxide than L. major and L. infantum. 
Indeed, the IC50 of L. tarentolae to H 2 0 2 was 152 ± 7 uM 
but 514 ± 115uM for L. major and 574 ± 121 uM for 
L. infantum. 

Gene content analysis also revealed differences in the 
glycoprotein content between L. tarentolae and the other 
pathogenic species. Genes involved in lipophosphoglycan 
(LPG) modifications were found to be either in lower copy 
number or absent from L. tarentolae. Chromosome 2 dis- 
played important differences between L. tarentolae and 



the two other sequenced Old World species in terms of 
phosphoglycan transferases (Figure 3). This region extends 
on two de novo assembled scaffolds in L. tarentolae, 
providing reliable information on this locus. We found 
that phosphoglycan (3 1,3 galactosyltransferase OG of 
genes necessary for the attachment of the side chain Gal 
residues to the LPG phosphoglycan (PG) repeats (48) are 
present in 6-8 copies in other Leishmania species but only 
in 2 copies in L. tarentolae (Supplementary Table S3). 
Within the same locus, phosphoglycan (3 1,2 arabinosyl- 
transferase (LmjF02.0180; LmjF02.0220), involved in the 
terminal capping oligosaccharide of LPG in L. major (49), 
is found in L. major and L. infantum but not in 
L. tarentolae (Supplementary Table S5). Leishmania 
tarentolae also lacks two glycosyltransferase genes 
(LmjF35.5250, LmjF29.21 10), enzymes that catalyze 
the transfer of a monosaccharide unit to a glycosyl 
acceptor molecule. Moreover, L. tarentolae lacks 
calreticulin (LmjF3 1.2600), an endoplasmic reticulum 
(ER) chaperone ensuring the proper folding and quality 
control of newly synthesized glycoproteins destined for 
secretion or cell surface expression (50) (Supplementary 
Table S5). 

Amastins, one of the largest family of surface proteins 
in Leishmania (17) shown to be expressed primarily in the 
intracellular stage of the parasite (51,52), are under- 
represented in L. tarentolae (Supplementary Table S3). 
Amastins, whose function has yet to be determined, are 
divided into four subfamilies based on their phylogeny 
and genomic positioning (53). We show that L. tarentolae 
contains the amastins of the a, (3 and y subfamilies that are 
also shared with Trypanosoma or Crithidia, but lacks 
all but two of the amastins of the delta subfamily 
(Figure 4A). This subfamily is expressed preferentially in 
the intracellular stage of the parasite (51). No new 
amastins were discovered in L. tarentolae. Interestingly, 
tuzins, which are often associated on the same chromo- 
somal locus with amastin genes in the pathogenic 
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Figure 3. Differential distribution of genes involved in lipophosphoglycan and phosphoglycan modification in L. tarentolae (middle) as compared to 
L. major (top) and L. infantum (bottom). Phosphoglycan |3 1,2 arabinosyltransferase are shaded with crosses, a first group of phosphoglycan p 1,3 
galactosyltransferase is shaded with thin lines and a second group with bold lines. The other genes on chromosome tracks are grey. Leishmania 
tarentolae contig delimitation is in black in the middle lane. Shade of synteny blocks is proportional to sequence identity, the darker the more similar 
are the sequences. The scale represents nucleotide position on chromosome 2. 
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Leishmania species, were less diverse in L. tarentolae with 
only 4 tuzin genes compared to 6 and 28 genes in 
L. infantum and L. major, respectively. The amastin- and 
tuzin-rich region within chromosome 8 contains only two 
copies of the tuzin gene in L. tarentolae (Figure 4B). On 
L. tarentolae chromosome 34, amastin and tuzin genes are 
present in single copy as opposed to the situation in 
L. major where they are organized in tandem, suggesting 
co-expression (Figure 4C). 

Gene family expansion in Leishmania tarentolae 

Genome comparative analysis between the L. tarentolae 
and the other sequenced Leishmania pathogenic species re- 
vealed that two surface-associated protein gene families 
were particularly enriched in the non-pathogenic 
L. tarentolae. The first family is leishmanolysin, a major 
surface zinc metalloprotease, also known as GP63; which 
is mostly, but not exclusively, expressed in the pro- 
mastigote stage of several Leishmania species (54-56). 
The GP63 OG in L. tarentolae is highly expanded 
with 49 putative GP63 genes as compared to 29 in 
L. braziliensis, 7 in L. infantum and 5 in L. major 
(Supplementary Table S4). Due to the high copy number 
and sequence variability, assembly of the GP63 gene 
family was limited and resulted in several partial gene se- 
quences. The density of read coverage for GP63 indicated 
portions of the gene that were well conserved in 
L. tarentolae while other regions were less represented, 
supporting a sequence diversity of GP63 in L. tarentolae 
(Figure 5A). Sequence diversity in the L. tarentolae GP63 
gene may affect its function. Indeed, a gelatin zymo- 
graphy assay showed no protease activity in L. tarentolae, 
in contrast to the pathogenic Leishmania species 
(Figure 6). 

The second surface protein gene family that was 
expanded in L. tarentolae is distantly related to the 
promastigote surface antigen proteins (PSA) located on 
chromosome 31. PSA-2 was originally described in 1989 
(57). The phylogenetic history of this gene family has 
recently been studied and members have been assigned to 
subfamilies based on their sequences and chromosomal 
locations (58). We observed that the L. tarentolae PSA 
genes are orthologous to PSA31C. There are 63 
paralogs in L. tarentolae, while only a single copy is pre- 
sent in L. major and L. infantum and two copies in 
L. braziliensis (Supplementary Table S4). Leishmania 
tarentolae PSA proteins seem to contain only the 
leucine-rich repeated motif (LRR) (Figure 5B), a domain 
often implicated in the binding to other proteins or 
glycolipids. This gene is proposed to be involved in 
host-pathogen interactions (58). 

Smaller copy number variations were observed for 
84 additional OGs, which are shown in Supplementary 
Table S4. 

Genes unique to Leishmania tarentolae 

We found 73 OG (95 genes) unique to L. tarentolae 
(Figure 2 and Supplementary Table S6). Of these 
groups, 31 (42%) had OrthoMCL orthologs in species 
other than L. infantum, L. major and L. braziliensis, 



including Trypanosoma spp. (26 groups; 36%). Putative 
functions could be ascribed to 10 OG (14%). These 
include a C3HC4 finger protein, a trafficking protein 
particle complex subunit 2-like protein, a La domain- 
containing protein, a malate dehydrogenase, an OmpA 
family protein, a phosphoinositide kinase, two surface 
protein GP63 OGs, a zinc finger (Ran-binding) family 
protein and a poly(A) polymerase, which is different from 
the poly(A) polymerases found in the pathogenic 
Leishmania. The remaining OG are sequence orphans 
with no orthologs found in the orthoMCL database. 

Experimental validation for gene content and 
copy number variations 

We validated experimentally some of the changes high- 
lighted from the genome sequence comparisons using 
comparative genomic hybridizations (CGH), Southern 
blot and read depth analysis. Full Leishmania genome 
microarrays designed for L. major and L. infantum were 
used to co-hybridize L. tarentolae with either the L. major 
or the L. infantum DNA to test whether we could identify 
genes specific to pathogenic species or genes with altered 
copy number. We found that 31-35% of the genes 
absent in L. tarentolae showed significantly higher 
signals with the DNA derived from the pathogenic 
species, which is statistically significant when compared 
to the overall ratio of 8% due to better hybridization of 
the DNAs derived from the pathogens (P < 0.001, one 
sample test for binomial proportion). Similarly, 20% of 
the L. tarentolae genes with lower copy number were 
validated by the CGH experiment (P < 0.001) confirm- 
ing in part the sequencing data. These correlations 
are explained by the design of the microarray that 
contains several probes (45%) that have more than 10 
mismatches with L. tarentolae, and by the fact that this 
microarray did not allow testing for the presence of 
L. tarentolae-specific genes. Considering these limitations, 
the CGH results allowed the validation of many differen- 
tially distributed genes between the Leishmania species. 
However, genes that are not validated by CGH may still 
be differentially distributed between the species. Genes 
confirmed by CGH are highlighted in Supplementary 
Tables S3 and S5. 

The correlation between sequence data and gene copy 
number was validated by Southern blot hybridizations 
for a few chosen genes (Figure 7). The delta-amastin sub- 
family is present in high copy number in L. major and 
L. infantum but not in L. tarentolae (Supplementary 
Table S3). The low copy number of delta-amastins in 
the Tarll-Parrot L. tarentolae strain predicted from the se- 
quencing data and their high copy number in the patho- 
genic species were confirmed by Southern blot 
hybridization (Figure 7A, lanes 3-5). Indeed the intense 
hybridization signals in the pathogenic species were due to 
the highly repetitive nature of a cluster of amastins giving 
rise to single restriction fragments (see Figure 7A). 
Interestingly, the copy number of delta-amastins was 
low not only in the Tarll-Parrot strain (Figure 7A, lane 
1) cultured for decades in the laboratory, but also in a 
more recent isolate from lizard with a limited history of 
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Figure 4. Analysis of amastin-coding genes in pathogenic and non-pathogenic Leishmania spp. (A) Phylogenetic tree of the amastin genes in 
L. tarentolae, L. major, L. infantum and L. braziliensis. Labels refer to amastin subfamilies. L. tarentolae amastins are in bold. The evolutionary 
history was inferred using the Neighbor-Joining method, with a bootstrap test of 500 replicates. Phylogenetic analyses were conducted in MEGA4 
(29). (B) Synteny of L. major (top), L. tarentolae (middle) and L. infantum (bottom) amastin/tuzin cluster located on chromosome 8. Amastins are 
marked with the letter A and tuzins with the letter T. (C) Synteny of L. major (top), L. tarentolae (middle) and L. infantum (bottom) amastin/tuzin 
cluster located on chromosome 34. 
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Figure 5. Density of read coverage for genes present in high copy number in L. tarentolae. For each position of the reference L. major genes, the 
number of corresponding reads were counted and plotted on the graph. Protein domains are indicated on the upper portion of each graph. 
(A) Leishmanolysin (GP63) gene; LmjFlO.0480 is used as a reference. (B) Promastigote surface antigen PSA31C gene; LmjF31.1440 is used as a 
reference. 



in vitro culture passages (Figure 7A, lane 2). As a control, 
we used a probe recognizing the proto-delta amastin gene, 
which according to the sequence data is present in similar 
copy numbers in all species. Southern blots have indeed 
validated this assumption (Figure 7B). 

In an independent set of experiments, Southern blot 
hybridizations corroborated sequencing data for the two 
genes involved in LPG side chain addition and LPG modi- 
fication, indicating that phosphoglycan p 1,3 galactosyl- 
transferase was present in lower copy number in 
L. tarentolae (Figure 7C) whereas phosphoglycan (3 1,2 
arabinosyltransferase was absent from this species 
(Figure 7D). Also, Southern blot analyses confirmed the 
expansion of the GP63 and PSA31C protein gene families 
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Figure 6. Protease activity of GP63 in six Leishmania species. 
(A) Western blot using monoclonal antibody targetting GP63 shows 
the quantity of this protease in each sample. (B) Gelatin zymography 
assay determining the protease activity of GP63. No signal was 
observed for L. tarentolae, suggesting the absence of GP63 activity in 
this species. Lane 1, L. mexicana; lane 2, L. major; lane 3, L. donovani; 
lane 4, L. infantum; lane 5, L. amazonensis; and lane 6, L. tarentolae. 
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Figure 7. Southern blot hybridization of genes whose copy number varies between L. tarentolae and the other Leishmania pathogenic species. Total 
genomic DNA of Leishmania isolates was digested with Xhol, run on agarose gels, blotted and hybridized with a combination of PCR-specific probes 
derived from each species (see Supplementary Table S2 for primer sequences and probe details). (A) Delta-amastin; (B) Proto-delta-amastin (shared 
by the four species) as a control; (C) Phosphoglycan p 1,3 galactosyltransferase; (D) Phosphoglycan p 1,2 arabinosyltransferase; (E) GP63; and (F) 
Surface antigen protein PSA31C. DNA loading was estimated by hybridizing the same blot with the single copy gene PTR1 (shown in the lower 
portion of each panel). Lanes 1, L. tarentolae Parrott-Tarll; lane 2, L. tarentolae S125; lane 3, L. major Friedlin; lane 4, L. infantum JPCM5; lane 5, 
L. braziliensis WHOM/BR/75/M2904. 



in L. tarentolae as deduced from the sequencing data 
(Figure 7E and 7F). The heterogeneity in the hybridizing 
fragments gives some credence to the size diversity of 
GP63 (Figure 5A) and PSA31C (Figure 5B) genes, as sug- 
gested by the sequencing data. Moreover, both gene 
families have a high read coverage, 3413 reads per nucleo- 
tide for GP63 and 7280 reads per nucleotide for PSA31C, 
which supports the idea that these genes are in high 
copy number (Supplementary Table S4). Read depth 
analysis also supports the higher copy number of genes 
listed in Supplementary Table S4 and the lower copy 
number of genes listed in Supplementary Table S3. The 
mean read coverage for single-copy genes was 22.4 read/ 
nucleotide. 



Orthologous groups of genes varying in copy number 
or missing from L. tarentolae are developmentally 
regulated in pathogenic Leishmania species 

We show here that some genes preferentially expressed in 
the amastigote stage, such as the amastins, are present in 
lower copy number or are absent from L. tarentolae, while 
genes reputed to have a higher expression in the insect 
stage (e.g. GP63 or PSA31C) are represented in higher 
copy number in L. tarentolae (Figure 7E and 7F). We 
performed additional bioinformatics analyses to test 
whether OGs of genes with a variable copy number 
between L. tarentolae and the pathogenic species were 
stage-regulated in the later. Specific expression of these 



1142 Nucleic Acids Research, 2012, Vol. 40, No. 3 



groups to Leishmanial life cycle stages was based on pub- 
lished transcriptomics and proteomics studies (59-71). 
Overall, 23% of the 7694 OG were associated to one 
or the other developmental stage of the parasite with 
890 OG associated to the promastigote stage and 1013 
OG associated to the amastigote stage (Table 2). 
Association to either developmental stage is indicated in 
Supplementary Tables S3-S5. Comparison of each 
category to overall distribution was performed using the 
Fisher's exact test. Interestingly, 70% (16/23) of the OG 
with a lower copy number and ~34% (29/86) of the OG 
with a higher copy number in L. tarentolae were differen- 
tially expressed in either life stage of the pathogenic 
Leishmania species (Table 2). Remarkably, OG absent 
from L. tarentolae were mainly associated to the 
amastigote life stage in other Leishmania (P < 0.001). 
Indeed, of the 66 OGs present in pathogenic Leishmania 
spp. but absent from L. tarentolae that were associated to 
one or the other developmental stage of the parasite, 18 
genes were linked to the promastigote life stage, 52 to the 
amastigote stage and 4 to both life stages (Table 2). 

Examples of genes that were absent from L. tarentolae 
but preferentially expressed in the amastigote stage of 
other Leishmania species are the delta-amastins and the 
hydrophilic surface protein gene family HASPA1, 
HASPA2, HASPB1, HASPB2 (Supplementary 
Table S5), known to be preferentially expressed in L. 
major, L. infantum and L. donovani amastigotes as 
determined by DNA microarray studies (68,69). This list 
also includes a 3,2-trans-enoyl-CoA isomerase, an ABC 
transporter-like protein, a class I nuclease-like protein, a 
D-isomer-specific 2-hydroxyacid dehydrogenase-protein, 
an Ef hand-like protein, a GTPase activator protein, a 
pteridine transporter and a tyrosine/dopa decarboxylase. 
The remaining amastigote OGs absent from L. tarentolae 
encode hypothetical proteins (Supplementary Table S5). 

DISCUSSION 

We have used next-generation DNA sequencing tech- 
nologies to obtain a high-quality draft (72) of the genome 
sequence of L. tarentolae, the first non-human trypano- 
somatid pathogen sequenced. Genome sequence analysis 
revealed that L. tarentolae is syntenic to the three- 
sequenced pathogenic Leishmania species and that >90% 
of the approximately 8200 genes are shared by all the spe- 
cies. However, a number of genes shown to be either 
important for pathogenesis or preferentially expressed in 
the intracellular parasitic stage in the pathogenic species 



or more relevant to the insect stage were either absent 
from L. tarentolae or present at variable copy number, 
supporting the hypothesis that some of these genes may 
be responsible for the reduced capacity of L. tarentolae 
to live as an intracellular parasite and its diminished 
pathogenic potential to humans. 

Among the genes absent from L. tarentolae, there was a 
significant bias for those expressed specifically in the 
amastigote intracellular stage in the pathogenic species 
(Table 2). Most of these genes correspond to hypothetical 
conserved or hypothetical proteins of unknown function 
(Supplementary Table S5) further highlighting the gaps in 
our understanding of the biology of intracellular parasites. 
Still, well-studied genes, such as the amastin gene family, 
especially the delta-amastins, are in low copy number in 
L. tarentolae (only 2 found, see Figure 4A) whereas high 
numbers of these genes (12-25 members) are found in the 
pathogenic species. The delta subfamily contains a diverse 
clade that is exclusive to Leishmania with one locus 
(proto-delta-amastins) found in other trypanosomatids 
(53). Although the function of amastins is still unknown, 
most of the delta-amastins undergo a stage-specific regu- 
lation and are preferentially expressed in amastigotes 
(51,52,68,73,74) and few in infective metacyclics (59). 
Given their specific expression in amastigotes, it has 
been argued that their expansion should be viewed as 
the adaptation of the parasite to a novel life stage after 
the acquisition of vertebrate parasitism (53). Leishmania 
tarentolae also lacks tuzins, conserved transmembrane 
proteins with unknown function, which are often contigu- 
ous with delta-amastins. Moreover, the majority of 
L. tarentolae genes displaying high variation in their 
copy number (e.g. GP63, PSA31C, phosphoglycan (3 1,3 
galactosyltransferase, amal, etc) are developmentally 
regulated in the pathogenic Leishmania. 

Two of the most studied abundant surface constituents 
of Leishmania promastigotes are the GPI-anchored mol- 
ecules lipophosphoglycan (LPG) (75) and GP63 zinc 
metallo-protease (76). In contrast to the mammalian para- 
sites, lizard Leishmania such as L. tarentolae and L. adleri 
seem to lack LPG (77). Sequencing data did not reveal any 
specific gene, absent or mutated, that could explain the 
lack of LPG. LPG has been implicated in many steps 
required for the establishment of initial macrophage infec- 
tion, notably in the inhibition of phagolysosome biogen- 
esis and in resistance to oxidants (75), but also in the 
development of innate immunity (78,79). Indeed, LPG- 
deficient L. major mutants showed attenuated virulence 
(80,81) although this was not the case for L. mexicana 



Table 2. Association of OGs of genes in L. tarentolae with differential expression in promastigote and/or amastigote life 
stages in pathogenic Leishmania species 

Promastigote Amastigote P-value* 

All OGs 7694 890 1013 NA 

Higher copy number in L. tarentolae 86 20 20 0.75 

Lower copy number in L. tarentolae 23 11 11 0.83 

OGs absent from L. tarentolae 278 18 52 <0.001 



*Promastigote versus amastigote comparison using two-tailed Fisher's exact test. 
NA: Not Applicable. 
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LPG-deficient mutants (82). In addition to LPG, 
L. tarentolae lacks enzymes that are involved in LPG 
modification like the phosphoglycan p 1,2 arabinosyl- 
transferase, which is involved in the terminal capping 
of LPG in L. major (49), and has less copies of the p 1,3 
galactosyltransferase gene, which adds the p 1,3 gala- 
ctoside to the repeating PG of LPG (48) (Figure 3). 
These modifications play a role in Leishmania— sand fly 
interactions (48,49,83). 

The major surface protease GP63, a well known viru- 
lence factor implicated in phagocytosis, parasite evasion 
of complement-mediated lysis, and induction of the host 
immune response (76,84), is highly expanded in L. 
tarentolae, as confirmed by Southern blot (Figure 7E) 
and read coverage analyses (Figure 5A). In the sand fly, 
GP63 is thought to be implicated in nutrient acquisition 
and attachment of promastigotes to the gut wall (76,85). 
Expansion of GP63 genes in L. tarentolae may thus facili- 
tate nutrient acquisition, and in the absence of LPG, GP63 
may be important for the parasite to maintain its attach- 
ment to the midgut. More recently, it was shown that 
during Lra/7ma«;'a-macrophage interaction, GP63 is 
internalized by macrophages, where it interacts and cleaves 
several intracellular macrophage proteins, including actin 
cytoskeleton regulators, protein tyrosine phosphatases, 
and transcription factors (86-89). Collectively, these 
cleavage-dependent activation events lead to the down- 
regulation of IFN-gamma signalling and downstream 
macrophage activation, including NO production (90). 
Interestingly, L. tarentolae was shown to lack protease 
activity (91), as also validated here by gelatin zymography 
(Figure 6). While GP63 genes were difficult to assemble, 
read analysis showed that some portions of the gene, es- 
pecially the N-terminal and C-terminal regions, were 
underrepresented or had increased sequence variability 
while other portions were conserved (Figure 5A). It has 
been reported previously that the C-terminus of GP63 
contains a GPI membrane anchor (92). Moreover, it has 
been shown recently that deletion of 180-211 amino acids 
from the C-terminal domain of GP63 resulted in 50% loss 
of its catalytic activity (93). Our data (Figures 5A and 6) 
and published reports (86,91) support the possibility that 
the high sequence variability of the L. tarentolae GP63 
gene may affect GP63 protease's activity. 

In addition to GP63 genes, L. tarentolae has expanded 
the promastigote surface antigen PSA31C for which only 
a single copy is present in the pathogenic Leishmania (58). 
The PSA proteins are part of eight subfamilies that have 
usually a signal peptide, a cysteine-rich region, leucine-rich 
repeats, a threonine/serine-rich region and a domain 
possibly acting as a GPI anchor (58). PSA were found 
to be either membrane-bound, secreted or soluble but little 
is known about their function (58). The expression of 
some PSA genes was found to be increased in metacyclic 
parasites (94). In L. tarentolae, the only recognizable 
feature in the PSA31C subfamily is the leucine-rich 
repeats domain (Figure 5B). The expansion of this sub- 
family in L. tarentolae (Figure 7F) is striking and warrants 
further analysis. One hypothesis is that the PSA31C gene 
has expanded in lizard parasites to facilitate promastigote 
survival either in the insect vector or in the lizard. 



Another interesting feature of the L. tarentolae genome 
is the lack of a number of genes encoding key functions of 
vesicular protein trafficking. In fact, L. tarentolae lacks the 
large adaptin subunits (P1/P2) and the medium-sized- 
u-adaptins that are part of the heterotetrameric AP-1 
complex, which is involved in the formation of clathrin- 
coated vesicles (CCVs) mediating protein transport be- 
tween the trans-Golgi network and endosomes (38). 
Leishmania mutants lacking AP- 1 subunits showed signifi- 
cant defects in Golgi structure, endocytosis or exocytic 
transport and displayed reduced rates of endosome- 
to-lysosome transport (95). Moreover, it was shown that 
Sigma 1- andu 1-adaptin homologues of L. mexicana are 
required for parasite survival in the infected host (96). In 
addition, the absence of membrane-bound acid phosphat- 
ase localizing to endosomal/lysosomal compartments (40) 
and of copines, soluble, calcium-dependent membrane- 
binding proteins with C2 domains known to be involved 
in cell signalling and/or membrane trafficking pathways 
and exocytosis (97) further suggests that L. tarentolae 
may have a deficient vesicular protein transport. In line 
with this, L. tarentolae lacks members of the Ras-like 
small GTP-binding proteins that have emerged as master 
regulators of cellular membrane transport by interacting 
with C2 protein domains (98). Also, L. tarentolae lacks 
three of the seven subunits of the Arp2/3 complex, 
which has been shown to function in cellular processes 
ranging from cell motility, cytokinesis and endocytosis 
to trafficking and cell-cell communication (44). Arp2/3 
complex is a key player in initiating actin polymerization at 
sites of endocytosis (99). Since in the absence of ARPC1, 
ARPC4 and ARPC5, the Arp2/3 complex cannot be 
formed (44,100), it is possible that L. tarentolae shows 
defects in endocytosis, which may impact the capacity of 
the parasite to communicate efficiently with the intracel- 
lular environment in the macrophage phagolysosome. 

Some other genes known to play a role in virulence were 
also missing from L. tarentolae. Indeed, L. tarentolae lacks 
one of the two subtilisin (SUB) proteases found to process 
the terminal peroxidases of the trypanothione reductase 
system (45). Subtilisin promotes survival of Leishmania 
amastigotes by serving as a maturase for the trypano- 
thione reductase system, which is essential to maintain 
redox homeostasis in the host macrophage and to 
protect the parasite against oxidative damage (101). 
Interestingly, SfZB-deficient Leishmania showed increased 
sensitivity to hydroperoxides and reduced viability in 
mouse infection models (45). In addition, L. tarentolae 
lacks DJ-1, a multifunctional oxidative stress response 
protein that defends cells against reactive oxygen species 
and mitochondrial damage (46). This is consistent with the 
observed decrease of L. tarentolae survival in the presence 
of H 2 0 2 . 

Here, we provide a high-quality draft of the genome 
sequence of the lizard L. tarentolae using next-generation 
sequencing (NGS) technologies. Limitations of NGS 
technologies are well known (102) and similar problems 
were observed with the L. tarentolae genome assembly. 
The size of the L. tarentolae genome is somewhat smaller 
(~5%) than that of the other sequenced Leishmania spp. 
due to collapse of regions with identical nucleotide 
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sequence. Also, repeated genes such as GP63 or PSA31C 
had a tendency to be fragmented in the assembly. 
However, these problems were attenuated by read 
number analysis. Homopolymers are an important cause 
of sequencing errors when using the 454 platform and can 
lead to frame shifts that make it difficult to differentiate 
actual sequencing errors from pseudogenes. In most cases, 
comparison of these specific regions to reads obtained by 
Illumina sequencing validated the presence or absence of 
genes. Most importantly, the current sequence draft 
provides additional research avenues to investigate 
Leishmania pathogenesis. The absence of LPG, acid phos- 
phatase and delta-amastin glycoproteins on the surface of 
L. tarentolae combined to GP63 lacking most likely its 
protease activity could lead to more vulnerable parasites, 
which may be more sensitive to complement-mediated 
lysis with a diminished capacity of survival within the 
host macrophage. In addition, a possible defect in the ves- 
icular trafficking of glycoproteins, plasma-membrane 
proteins and secreted proteins may have important conse- 
quences on the ability of L. tarentolae to survive as an 
intracellular parasite. Furthermore, L. tarentolae lacks 
several proteins of unknown functions (including the 
delta-amastins) shown to be expressed preferentially in 
the amastigote stage. The absence of these proteins may 
explain why L. tarentolae cannot replicate efficiently in 
mammalian macrophages. In summary, this study 
provides insights into the reconstruction of the steps 
leading to increase adaptation for intracellular parasitism 
and suggests a number of hypothesis that can be experi- 
mentally tested. 

ACCESSION NUMBER 

The genome sequence of L. tarentolae has been deposited 
in tritrypdb.org. Upon inclusion, the sequence will be 
transferred to Genbank. 
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